An automated signalized junction controller that learns strategies by temporal difference reinforcement learning

Authors

  • Simon Box
  • Ben Waterson
Abstract

This paper shows how temporal difference learning can be used to build a signalized junction controller that learns its own strategies through experience. Simulation tests detailed here show that the learned strategies can perform well. This work builds upon previous work in which a neural network based junction controller that can learn strategies from a human expert was developed (Box and Waterson, 2012). In the simulations presented, vehicles are assumed to broadcast their position over WiFi, giving the junction controller rich information. The vehicles' position data are pre-processed to describe a simplified state. The state space is classified into regions associated with junction control decisions using a neural network. This classification is the strategy, and it is parametrized by the weights of the neural network. The weights can be learned either through supervised learning with a human trainer or through reinforcement learning by temporal difference (TD). Tests on a model of an isolated T junction show an average delay of 14.12 s and 14.36 s respectively for the human trained and TD trained networks. Tests on a model of a pair of closely spaced junctions show 17.44 s and 20.82 s respectively. Both methods of training produced strategies that were approximately equivalent in their equitable treatment of vehicles, defined here as the variance of the journey time distributions.
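To make the TD idea in the abstract concrete, here is a minimal sketch of a TD(0) weight update driving a signal choice. It is not the authors' method: the paper uses a neural network classifier, whereas this sketch swaps in a linear value function, and the toy two-approach junction, the arrival probabilities, the feature map `phi`, the reward (negative total queue as a stand-in for delay), and all function names are assumptions made for illustration only.

```python
import random

def value(weights, features):
    """Linear value estimate V(s) = w . phi(s)."""
    return sum(w * x for w, x in zip(weights, features))

def td0_update(weights, features, reward, next_features, alpha=0.01, gamma=0.9):
    """One TD(0) step: move V(s) toward reward + gamma * V(s')."""
    td_error = reward + gamma * value(weights, next_features) - value(weights, features)
    new_weights = [w + alpha * td_error * x for w, x in zip(weights, features)]
    return new_weights, td_error

def phi(queues):
    """Hypothetical simplified state: bias term plus queue length per approach."""
    return [1.0] + [float(q) for q in queues]

def discharge(queues, green):
    """Deterministic part of a step: the approach with green releases one vehicle."""
    after = list(queues)
    after[green] = max(0, after[green] - 1)
    return after

def choose_green(weights, queues, rng, epsilon=0.1):
    """Epsilon-greedy choice over the estimated value of each action's afterstate."""
    if rng.random() < epsilon:
        return rng.randrange(len(queues))
    return max(range(len(queues)),
               key=lambda g: value(weights, phi(discharge(queues, g))))

def train(steps=5000, arrival_prob=(0.2, 0.4), seed=0):
    """Run the toy junction (assumed model) and learn weights by TD(0)."""
    rng = random.Random(seed)
    weights = [0.0, 0.0, 0.0]
    queues = [0, 0]
    for _ in range(steps):
        green = choose_green(weights, queues, rng)
        nxt = discharge(queues, green)
        for i, p in enumerate(arrival_prob):  # random arrivals on each approach
            if rng.random() < p:
                nxt[i] += 1
        reward = -float(sum(nxt))  # negative queue total as a stand-in for delay
        weights, _ = td0_update(weights, phi(queues), reward, phi(nxt))
        queues = nxt
    return weights

if __name__ == "__main__":
    print(train())
```

In this sketch the learned weights play the role the abstract assigns to the network weights: they define which signal decision is preferred in each region of the simplified state space.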


Similar resources

An automated signalized junction controller that learns strategies from a human expert

An automated signalized junction control system that can learn strategies from a human expert has been developed. This system applies Machine Learning techniques based on Logistic Regression and Neural Networks to effect a classification of state space using evidence data generated when a human expert controls a simulated junction. The state space is constructed from a series of bids from agent...


Reinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic

In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...


Control of Multivariable Systems Based on Emotional Temporal Difference Learning Controller

One of the most important issues in controlling delayed systems and non-minimum phase systems is fulfilling several objectives simultaneously and as well as possible. This paper proposes a new objective-oriented method for controlling multi-objective systems. The principles of this method are based on emotional temporal difference learning, and has a...


Learning Through Interaction

Reinforcement learning is an approach for learning an optimal action policy through experience, i.e. using the rewards observed in environment states. Reinforcement learning algorithms include adaptive dynamic programming, temporal difference learning and Q-learning [1]. Examples of successful applications of reinforcement learning are controller for sustained inverted flight on an autonomous helicopter [...


Bayesian Reinforcement Learning with Behavioral Feedback

In the standard reinforcement learning setting, the agent learns an optimal policy solely from state transitions and rewards from the environment. We consider an extended setting where a trainer additionally provides feedback on the actions executed by the agent. This requires appropriately incorporating the feedback, even when the feedback is not necessarily accurate. In this paper, we present a ...



Journal title:
  • Eng. Appl. of AI

Volume 26  Issue 

Pages  -

Publication date 2013